Introducing the Prague Discourse Treebank 1.0

نویسندگان

  • Lucie Poláková
  • Jirí Mírovský
  • Anna Nedoluzhko
  • Pavlína Jínová
  • Sárka Zikánová
  • Eva Hajicová
چکیده

We present the Prague Discourse Treebank 1.0, a collection of Czech texts annotated for various discourse-related phenomena "beyond the sentence boundary". The treebank contains manual annotations of (1), discourse connectives, their arguments and senses, (2), textual coreference, and (3), bridging anaphora, all carried out on 50k sentences of the treebank. Contrary to most similar projects, the annotation was performed directly on top of syntactic trees (from the previous project of the Prague Dependency Treebank 2.5), benefiting thus from the linguistic information already existing on the same data. In this article, we present our theoretical background, describe the annotations in detail, and offer evaluation numbers and corpus statistics.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank

The present paper reports on a preparatory research for building a language corpus annotation scenario capturing the discourse relations in Czech. We primarily focus on the description of the syntactically motivated relations in discourse, basing our findings on the theoretical background of the Prague Dependency Treebank 2.0 and the Penn Discourse Treebank 2. Our aim is to revisit the present-...

متن کامل

Discourse Relations in the Prague Dependency Treebank 3.0

The aim of the demo is threefold. First, it introduces the current version of the annotation tool for discourse relations in the Prague Dependency Treebank 3.0. Second, it presents the discourse relations in the treebank themselves, including new additions in comparison with the previous release. And third, it shows how to search in the treebank, with focus on the discourse relations.

متن کامل

Use of Coreference in Automatic Searching for Multiword Discourse Markers in the Prague Dependency Treebank

The paper introduces a possibility of new research offered by a multi-dimensional annotation of the Prague Dependency Treebank. It focuses on exploitation of the annotation of coreference for the annotation of discourse relations expressed by multiword expressions. It tries to find which aspect interlinks these linguistic areas and how we can use this interplay in automatic searching for Czech ...

متن کامل

Genres in the Prague Discourse Treebank

We present the project of classification of Prague Discourse Treebank documents (Czech journalistic texts) for their genres. Our main interest lies in opening the possibility to observe how text coherence is realized in different types (in the genre sense) of language data and, in the future, in exploring the ways of using genres as a feature for multi-sentence-level language technologies. In t...

متن کامل

Annotation of Discourse Connectives for the Prague Dependency Treebank

The paper presents a preliminary study on discourse connectives (DC) in Czech. Aiming to build a computerized language corpus capturing discourse relations in Czech, we base our observations on current foreign projects with the same purpose. In this study, first, the different methods of linguistic analysis of the discourse structure and discourse connectives are described, next, the nature and...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013